Components of SAP Data Services


LESSON OBJECTIVES 


After completing this lesson, you will be able to: 


• Understand the components of SAP Data Services


What is SAP Data Services? 


• Single development user interface
• Metadata repository
• Data connectivity layer
• Runtime environment
• Management Console


SAP Data Services delivers a single enterprise-class solution for data integration, data quality,
data profiling, and text data processing.


Businesses can use SAP Data Services to integrate, transform, improve, and deliver trusted 
data to critical business processes. IT organizations can depend on SAP Data Services for 
maximum operational efficiency to improve data quality and gain access to heterogeneous 
sources and applications. SAP Data Services provides all of these features using the elements 
listed above. 
SAP Data Services and the SAP Solution Portfolio


Figure 1: SAP Data Services and the SAP Solution Portfolio (figure label: Data Sources: Structured and Unstructured)





The SAP solution portfolio delivers extreme insight through specialized end-user tools on a 
single, trusted business intelligence (BI) platform. This entire platform is supported by SAP 
Data Services. On top of SAP Data Services, the SAP solution portfolio layers the most 
reliable, scalable, flexible, and manageable business intelligence platform which supports the 
industry's best integrated end-user interfaces: reporting, query and analysis, and 
performance management dashboards, scorecards, and applications. 


True data integration blends batch extraction, transformation, and loading (ETL) technology 
with real-time bi-directional data flow across multiple applications for the extended 
enterprise. 


By building a relational datastore and intelligently blending direct real-time and batch
data-access methods to access data from enterprise resource planning (ERP) systems and other
sources, SAP has created a powerful, high-performance data integration product that allows
you to fully leverage your ERP and enterprise application infrastructure for multiple uses.


SAP provides a batch and real-time data integration system to drive today's new generation 
of analytic and supply-chain management applications. Using the highly scalable data 
integration solution provided by SAP, your enterprise can maintain a real-time online dialogue 
with customers, suppliers, employees, and partners, providing them with the critical 
information they need for transactions and business analysis. 


SAP Data Services Components 


Data Services Designer 


The Designer is a development tool with an easy-to-use graphical user interface. It enables 
developers to define data management applications that consist of data mappings, 
transformations, and control logic. 


Use the Designer to create applications containing work flows (job execution definitions) and 
data flows (data transformation definitions). 
Unit 1: What is SAP Data Services?


To use the Designer, create objects, then drag, drop, and configure them by selecting icons in 
flow diagrams, table layouts, and nested workspace pages. The objects in the Designer 
represent metadata. The Designer interface allows you to manage metadata stored in a
repository. From the Designer, you can also trigger the job server to run your jobs for initial 
application testing. 


Repositories 


The SAP Data Services repository is a set of tables that hold user-created and predefined 
system objects, source and target metadata, and transformation rules. Set up repositories on 
an open client/server platform to facilitate sharing metadata with other enterprise tools. Each 
repository must be stored on an existing RDBMS and registered in the Central Management 
Console (CMC). 


Each repository is associated with one or more job servers which run the jobs you create. 
There are two types of repositories: 


• Local repository


A local repository is used by an application designer to store definitions of objects (like 
projects, jobs, work flows, and data flows) and source/target metadata. 


• Central repository


A central repository is an optional component that can be used to support multi-user 
development. The central repository provides a shared object library allowing developers 
to check objects in and out of their local repositories. 


While each user works on applications in a unique local repository, the team uses a central 
repository to store the master copy of the entire project. The central repository preserves all 
versions of an application's objects, so you can revert to a previous version if needed. 
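
The check-out and check-in cycle described above can be pictured with a small in-memory model. The following Python sketch is illustrative only and does not reflect how SAP Data Services implements its central repository; the class `CentralRepository`, its methods, and the object name `DF_Customers` are hypothetical.

```python
class CentralRepository:
    """Toy model of a shared object library that keeps every checked-in version."""

    def __init__(self):
        self._versions = {}      # object name -> list of definitions, oldest first
        self._checked_out = {}   # object name -> developer holding the check-out

    def check_in(self, name, definition, developer):
        """Store a new version; refuse if another developer holds the check-out."""
        holder = self._checked_out.get(name)
        if holder is not None and holder != developer:
            raise PermissionError(f"{name} is checked out by {holder}")
        self._versions.setdefault(name, []).append(definition)
        self._checked_out.pop(name, None)
        return len(self._versions[name])  # number of the version just created

    def check_out(self, name, developer):
        """Reserve the object for one developer and return its latest version."""
        if name in self._checked_out:
            raise PermissionError(f"{name} is already checked out")
        self._checked_out[name] = developer
        return self._versions[name][-1]

    def get_version(self, name, version):
        """Return an earlier version (1-based), for example to revert to it."""
        return self._versions[name][version - 1]


repo = CentralRepository()
repo.check_in("DF_Customers", "first mapping", "anna")
repo.check_out("DF_Customers", "anna")
repo.check_in("DF_Customers", "second mapping", "anna")
print(repo.get_version("DF_Customers", 1))
```

Because every check-in appends to the version list instead of overwriting it, an earlier definition stays retrievable, which is the property the text relies on when it mentions reverting to a previous version.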


Multi-user development includes other advanced features such as labeling and filtering to 
provide you with more flexibility and control in managing application objects. 


Job Server 


The SAP Data Services job server starts the data movement engine that integrates data from 
multiple heterogeneous sources, performs complex data transformations, and manages 
extractions and transactions from ERP systems and other sources. The job server can move 
data in either batch or real-time mode and uses distributed query optimization, multi-threading,
in-memory caching, in-memory data transformations, and parallel processing to
deliver high data throughput and scalability. 


While designing a job, you can run it from the Designer which tells the job server to run the 
job. The job server retrieves the job from its associated repository, then starts an engine to 
process the job. In your production environment, the job server runs jobs triggered by a 
scheduler or by a real-time service managed by the access server. In production 
environments, you can balance job loads by creating a job server group (multiple job servers) 
which executes jobs according to overall system load. 
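
As a rough illustration of load-based dispatch in a job server group, the Python sketch below picks the least-loaded member. It is an assumption-laden simplification: the `JobServer` class, the `load` index, and the server names are hypothetical, and the actual selection logic used by SAP Data Services is not described in this lesson.

```python
from dataclasses import dataclass


@dataclass
class JobServer:
    name: str
    load: float  # hypothetical load index: 0.0 means idle, 1.0 means saturated


def pick_job_server(group):
    """Return the member of the job server group with the lowest load index."""
    if not group:
        raise ValueError("job server group is empty")
    return min(group, key=lambda server: server.load)


group = [JobServer("JS_A", 0.72), JobServer("JS_B", 0.15), JobServer("JS_C", 0.40)]
print(pick_job_server(group).name)
```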


Engine 


When Data Services jobs are executed, the job server starts engine processes to perform data 
extraction, transformation, and movement. The engine processes use parallel processing and 
in-memory data transformations to deliver high data throughput and scalability. 


Service 




The Data Services service is installed when job and access servers are installed. The service 
starts job servers and access servers when you restart your system. The Windows service 
name is SAP Data Services. The UNIX equivalent is a daemon named AL_JobService. 


Access Server 


The SAP Data Services access server is a real-time, request-reply message broker that 
collects message requests, routes them to a real-time service, and delivers a message reply 
within a user-specified time frame. The access server queues messages and sends them to 
the next available real-time service across any number of computing resources. This 
approach provides automatic scalability because the access server can initiate additional 
real-time services on additional computing resources if traffic for a given real-time service is
high. You can configure multiple access servers. 
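
The request-reply pattern described above (queue the message, hand it to the next available service, return a reply within a time limit) can be sketched in Python as follows. This is a conceptual model only, not the access server's implementation; `AccessServerSketch`, `upper_service`, and the default timeout are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def upper_service(message):
    """Stand-in for a real-time service: turns a request into a reply."""
    return message.upper()


class AccessServerSketch:
    """Toy request-reply broker with a pool of service workers."""

    def __init__(self, service, workers=2, timeout_seconds=1.0):
        self._service = service
        # Each worker thread plays the role of one real-time service instance.
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._timeout = timeout_seconds

    def request(self, message):
        # submit() queues the message; the next idle worker picks it up.
        future = self._pool.submit(self._service, message)
        try:
            return future.result(timeout=self._timeout)
        except TimeoutError:
            return None  # no reply arrived within the configured time frame

    def shutdown(self):
        self._pool.shutdown(wait=True)


broker = AccessServerSketch(upper_service)
print(broker.request("order status 4711"))
broker.shutdown()
```

Raising `workers` in this model corresponds loosely to the access server starting additional real-time services when traffic is high.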


Adapter SDK 


The SAP Data Services Adapter SDK provides a Java platform for rapid development of 
adapters to other applications and middleware products. Adapters use industry-standard 
XML and Java technology to ease the learning curve. Adapters provide all necessary styles of 
interaction including the following: 


• Reading, writing, and request-reply from SAP Data Services to other systems
• Request-reply from other systems to SAP Data Services


License Manager 


License Manager can be used only in command-line mode. You can use it to manage your 
product activation keycodes—the alphanumeric codes that are referred to each time that you 
run certain software. By using License Manager, you can view, add, and remove product 
activation keycodes for SAP solution portfolio software (such as SAP Data Services) that
requires them.


License Manager accesses keycodes on the local system only; you cannot access the 
keycodes from a remote system. When updating keycodes, make the changes on all SAP Data 
Services computers by launching License Manager on each computer, including Designer and 
job server computers. 


Server Manager 


The Server Manager allows you to add, delete, or edit the properties of job servers and access 
servers. It is automatically installed on each computer on which you install a job server or 
access server. 


Use the Server Manager to define links between job servers and repositories. You can link 
multiple job servers on different machines to a single repository (for load balancing) or each 
job server to multiple repositories (with one default) to support individual repositories 
(separating test from production, for example). 


The Server Manager is also where you specify SMTP server settings for the smtp_to email
function. In addition, you can add or edit System Landscape Directory (SLD) registration
information for SAP Data Services in the SLD tab of the Server Manager.


LESSON SUMMARY 


You should now be able to: 


• Understand the components of SAP Data Services
Architecture of SAP Data Services


LESSON OBJECTIVES 


After completing this lesson, you will be able to: 


• Understand the architecture of SAP Data Services


Architecture: Standard Components 
Figure 2: SAP Data Services Architecture - Standard Components


SAP Data Services is designed for high performance across a broad spectrum of user and 
deployment scenarios. Developers can integrate SAP Data Services into your organization's 
other technology systems by using web services, Java, or .NET application programming 
interfaces (APIs). 





End users can access, create, edit, and interact with SAP Data Services projects and reports 
using specialized tools and applications that include: 


• Designer
• Management Console


IT departments can use data and system management tools that include: 
• Central Management Console (CMC)
• Management Console
• Server Manager
• Repository Manager


To provide flexibility, reliability, and scalability, SAP Data Services components can be 
installed on one or across many machines. 


Server processes can be "vertically scaled" (where one computer runs several, or all,
server-side processes) to reduce cost, or "horizontally scaled" (where server processes are
distributed between two or more networked machines) to improve performance. It is also 
possible to run multiple, redundant versions of the same server process on more than one 
machine, so that processing can continue if the primary process encounters a problem. 
Figure 3: Data Services Architecture





You can distribute software components across multiple computers, subject to the following 
rules: 


• Engine processes run on the same computer as the job server that spawns them.
• Adapters require a local job server.


Distribute components across a number of computers to best support the traffic and 
connectivity requirements of your network. You can create a minimally distributed system 
designed for development and testing, or a highly distributed system that can scale with the 
demands of a production environment. 
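
The two placement rules above lend themselves to a quick sanity check of a planned layout. The Python sketch below is a minimal illustration under stated assumptions: the dictionary format, the function `check_layout`, and the component and host names are all hypothetical and are not part of any SAP tool.

```python
def check_layout(components):
    """Check a planned layout against the two distribution rules.

    components: list of dicts with 'type', 'name', 'host', and, for engines,
    'job_server' (the name of the job server that spawns the engine).
    Returns a list of human-readable problems; an empty list means no violations.
    """
    job_server_hosts = {
        c["name"]: c["host"] for c in components if c["type"] == "job_server"
    }
    hosts_with_job_server = set(job_server_hosts.values())
    problems = []
    for c in components:
        if c["type"] == "engine":
            # Rule 1: an engine runs on the host of the job server that spawned it.
            if job_server_hosts.get(c["job_server"]) != c["host"]:
                problems.append(
                    f"engine {c['name']} is not on the host of {c['job_server']}"
                )
        elif c["type"] == "adapter":
            # Rule 2: an adapter needs a job server on its own host.
            if c["host"] not in hosts_with_job_server:
                problems.append(f"adapter {c['name']} has no local job server")
    return problems


layout = [
    {"type": "job_server", "name": "JS1", "host": "hostA"},
    {"type": "engine", "name": "E1", "host": "hostA", "job_server": "JS1"},
    {"type": "adapter", "name": "AD1", "host": "hostB"},
]
print(check_layout(layout))
```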
SAP Integration


SAP Data Services integrates with your existing SAP infrastructure with these common tools:


SAP System Landscape Directory (SLD)

The system landscape directory of SAP NetWeaver is the central source of system landscape
information relevant for the management of your software life-cycle. By providing a directory
comprising information about all installable software available from SAP and automatically
updated data about systems already installed in a landscape, you get the foundation for tool
support to plan software life-cycle tasks in your system landscape.

The SAP Data Services installation program registers the vendor and product names and
versions with the SLD, as well as server and front-end component names, versions, and
location.


SAP Solution Manager

The SAP Solution Manager is a platform that provides the integrated content, tools, and
methodologies to implement, support, operate, and monitor an organization's SAP and non-SAP
solutions.

Non-SAP software with an SAP-certified integration is entered into a central repository and
transferred automatically to your SAP System Landscape Directories (SLD). SAP customers can
then easily identify which version of third-party product integration has been certified by
SAP within their SAP system environment. This service provides additional awareness for
third-party products besides our online catalogs for third-party products.

SAP Solution Manager is available to SAP customers at no extra charge, and includes direct
access to SAP support and SAP product upgrade path information.


CTS Transport (CTS+)

The Change and Transport System (CTS) helps you to organize development projects in ABAP
Workbench and in Customizing, and then transport the changes between the SAP systems in your
system landscape. As well as ABAP objects, you can also transport Java objects (J2EE, JEE)
and SAP-specific non-ABAP technologies (such as Web Dynpro Java or SAP NetWeaver Portal) in
your landscape.


Monitoring with CA Wily Introscope

CA Wily Introscope is a web application management product that delivers the ability to
monitor and diagnose performance problems that may occur within Java-based SAP modules in
production, including visibility into custom Java applications and connections to back-end
systems. It allows you to isolate performance bottlenecks in NetWeaver modules including
individual Servlets, JSPs, EJBs, JCOs, Classes, Methods, and more. It offers real-time,
low-overhead monitoring, end-to-end transaction visibility, historical data for analysis or
capacity planning, customizable dashboards, automated threshold alarms, and an open
architecture to extend monitoring beyond NetWeaver environments.





LESSON SUMMARY 


You should now be able to: 


• Understand the architecture of SAP Data Services


Lesson 1


Installation Prerequisites 


Lesson 2 


Installing Information Platform Services or BI Platform 


Lesson 3 


Installing SAP Data Services 


UNIT OBJECTIVES 


• Understand installation prerequisites


• Install Information Platform Services or BI Platform


• Install SAP Data Services


Installation Prerequisites


LESSON OBJECTIVES 


After completing this lesson, you will be able to: 


• Understand installation prerequisites


Installation Order 





Figure 4: Order of Installation 


Either the SAP BusinessObjects BI Platform (formerly known as BusinessObjects Enterprise) or
Information Platform Services (IPS) must be installed before SAP Data Services is installed.


IPS is a limited version of the BI Platform. It includes only the components needed to
support SAP Data Services. IPS is intended for clients who have not purchased a "full" version
of the BI platform, or who wish to keep their SAP Data Services landscape separate from their
BI deployment. Information Steward can be installed after SAP Data Services has been
successfully installed.
BIP or IPS License


When you install SAP Data Services on top of the Business Intelligence platform (BI platform), the BI
licensing model is used when connecting to the Central Management Server (CMS). For example, if
you have ten BI named user licenses, these named user licenses are also shared with SAP Data
Services. This means you can create only ten users in the CMS and have at most ten CMS sessions
at any point in time. To take advantage of unlimited user licenses when connecting to the CMS, install
SAP Data Services on top of SAP BusinessObjects Information platform services (IPS).


Figure 5: BI Platform or Information Platform Services (IPS)?


For more information about BI Platform user licenses, see SAP Note 2176896. 





Another consideration involves the metadata management features of SAP Data Services and 
Information Steward. Data lineage and impact analysis is simplified when SAP Data Services 
shares the CMS of your organization’s BI landscape. 


LESSON SUMMARY


You should now be able to:


• Understand installation prerequisites


Installing Information Platform Services or BI Platform


LESSON OBJECTIVES


After completing this lesson, you will be able to:


• Install Information Platform Services or BI Platform


BI Platform Installation 


The planning process flow involves the following steps:

• Check installation prerequisites.
• Choose a database server.
• Choose a web application server.
• Prepare for installation.
• Verify installation.


Figure 6: Installation of BI Platform


The BI platform (or Information Platform Services, IPS) can be installed on Windows, Unix, or Linux 
platforms. 


Before installing, ensure the operating system, application server, database server, and other components 
on which you will install the BI platform are supported. For more information, see the SAP BusinessObjects 
BI 4.2 Product Availability Matrix (PAM) at
https://support.sap.com/content/dam/library/ssp/infopages/pam-essentials/SBOP_BI_42.pdf.


Figure 7: Installation Prerequisites 


Decide whether to use the included Sybase SQL Anywhere database server for the CMS and auditing 
databases. 


If you do not have a database server to use with the BI platform, the installation program can install and
configure one for you. It is recommended that you evaluate your requirements against information from your
database server vendor to determine which supported database would best suit your organization's
requirements.


Figure 8: Database Server


Note:
If you do not plan to use the default database that is included in the installation
program, ensure the database that you plan to use is configured before beginning
the installation. The database must have user accounts with the appropriate
database privileges ready, and the appropriate drivers must be installed and
verified as working. The installation program will connect to and initialize the
database.





The installation program will only install a database on the local machine. It cannot install 
across a network. 


The BI platform provides Tomcat as the default Java web application server, offered as an option during the
installation process. If you will be using a different web application server, it must be operational
and available when you run the installation program.


For a complete list of supported web application servers, see the Product Availability Matrix (PAM).


Figure 9: Web Application Server


The "Management Console" option installs only web applications to a supported Java web 
application server. This option is useful for deploying web applications to nodes in a web 
application server cluster. 





Ensure that sufficient disk space is available. Allow for both the operating system and the 
software to grow over time as patches or new components become available. 
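
A disk space check of this kind is easy to script before starting the installer. The Python sketch below uses only the standard library; the 20 GB threshold is an assumed figure for illustration and is not an SAP requirement, so take the real value from the Product Availability Matrix.

```python
import shutil


def has_free_space(path, required_bytes):
    """Return True if the partition holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


REQUIRED_GB = 20  # assumed figure for illustration only, not an SAP requirement
print(has_free_space(".", REQUIRED_GB * 1024**3))
```

Pass the planned destination folder instead of "." to check the partition that will actually receive the software.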


Gather the installation media or download the latest release and any patches or Support 
Packages from the SAP Service Marketplace. 


If you plan to use SAP System Landscape Directory (SLD), ensure that the SAP Host Agent is
installed before installing the BI platform. 


Decide the values for options you will set during the installation process. In most cases, you can 
accept the default values. More advanced installations require that you plan the installation 
process. 


Installation Item                  Description
Destination Folder                 Where to install the software
Database Server                    Install SQL Anywhere? Otherwise, type, connection, and authentication details
Server Intelligence Agent (SIA)


Figure 11: Required Installation Information
Verify your installation by logging into the system using the Central Management Console (CMC).


http://<WAS_HOSTNAME>:<PORT>/BOE/CMC


Log on as the Administrator user with the password selected during installation.


Figure 12: Post Installation


Note: 
The default Tomcat web application server's listening port number is 8080. 
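
To make the URL pattern concrete, the following Python sketch assembles the CMC address from a host name and port, defaulting to the Tomcat port mentioned in the note. The function name `cmc_url` and the host name `bihost01` are hypothetical.

```python
from urllib.parse import urlunsplit


def cmc_url(hostname, port=8080):
    """Build the CMC logon URL for a web application server host and port."""
    return urlunsplit(("http", f"{hostname}:{port}", "/BOE/CMC", "", ""))


print(cmc_url("bihost01"))  # prints http://bihost01:8080/BOE/CMC
```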





For more information on using the CMC, see the SAP BusinessObjects Business Intelligence 
Platform Administrator Guide. 


LESSON SUMMARY


You should now be able to:


• Install Information Platform Services or BI Platform


Installing SAP Data Services


LESSON OBJECTIVES


After completing this lesson, you will be able to:


• Install SAP Data Services


Installation Methods 
The following installation methods are available: 


• Interactive (with default configuration)


This type of installation creates a repository and registers it with both the job server and 
CMS. Choosing to install SAP Data Services using the default configuration enables you to 
log on to the Designer and execute jobs immediately after installing. 


• Interactive (without configuration)


During this type of installation, the installation program does not present the screens to 
create a local repository or set up the repository database connection. After installing 
without configuration, you must ensure that the Adaptive Processing Server (APS) 
Services, which are required for Data Services and Information Steward product usage, 
are deployed to at least one CMS in your landscape. 


• Silent installation


A silent installation is when you install SAP Data Services using the command line. 
Installation options can be given directly in the command line as a parameter, or can be 
stored in a response file. 


Typically, you would choose to install SAP Data Services without configuration in order to
install the SAP Data Services client feature on Windows, or to install the standalone job
server feature on a Windows or Unix/Linux server.


Planning Installation 
The planning process flow involves the following steps: 


• Determine system requirements
• Set up account permissions
• Determine network permissions
• Choose a web application server
• Choose a database server


System Requirements 


When you install SAP Data Services on a local drive, use the following guidelines: 
• Before you run the installation program, ensure that the destination partition has enough
room for the deployment to expand (when updates and new features are added in the
future).


• If you want to install the deployment on the operating system partition, ensure that there is
enough room for the deployment and the operating system.


• If you have previously installed any SAP products, the installation program will use the
existing directory.


• Before you install SAP Data Services, ensure that your host systems meet all software
dependency requirements.


• The 64-bit Designer and job server cannot coexist with Microsoft Office products earlier
than Microsoft Office 2010.


For a complete list of supported operating systems and hardware requirements, see the
Product Availability Matrix (PAM) at https://support.sap.com/release-upgrade-maintenance/pam.html.


Required Account Permissions 


Required Permissions

• Network connectivity through appropriate ports to all host systems in the deployment.
• Access to shared file system directories for users of the deployment.
• Appropriate network authentication privileges.


Figure 13: Required Account Permissions


If you are installing SAP Data Services on a Windows 7 host system that has User Account 
Control (UAC) enabled, run the installation program with the host system built-in 
administrator account. If you use a normal account, a UAC prompt appears. 


Network Permissions 


When you install SAP Data Services across multiple host systems, ensure your network 
functions properly by using the following guidelines: 


• Each host system must be able to communicate with the Central Management Server
(CMS). The CMS coordinates the functioning of all the servers in the deployment.


• Each host system must be able to communicate with the host that runs the repository
database.


• Each client, such as the Designer, must be able to communicate with the job server or
servers.


• Each host system must use a fixed hostname. Fully qualified hostnames are supported.


Ensure that deployment hostnames do not include any of the following characters: 
underscore (_), period (.), backslash (\), or forward-slash (/). 
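
The character restriction above can be checked mechanically before hosts are added to a deployment. The Python sketch below tests a name against exactly the four characters listed in the text; the function name and the sample host names are hypothetical.

```python
FORBIDDEN_CHARS = set("_.\\/")  # underscore, period, backslash, forward slash


def invalid_hostname_chars(hostname):
    """Return the sorted list of forbidden characters that occur in `hostname`."""
    return sorted(set(hostname) & FORBIDDEN_CHARS)


for name in ("dsserver01", "ds_server.01"):
    found = invalid_hostname_chars(name)
    print(name, "ok" if not found else f"contains {found}")
```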


Port Assignments 


For each of your host systems, verify that all ports to be used by SAP Data Services 
components are available and not in use by other programs. 
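
One simple way to verify that a port is not already in use is to try binding a TCP socket to it. The Python sketch below does this with the standard library; the function name is hypothetical, and the two ports are the defaults that appear in this lesson (4001 and 8080), used here only as examples.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to (host, port) on this machine."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # another program already holds the port, or binding is not permitted
    return True


for port in (4001, 8080):
    print(port, "free" if port_is_free(port) else "in use")
```

Run the check on each host system in the deployment, because a port that is free on one machine may be taken on another.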


For a development system, you can install many components on the same host. Installing on 
a single host simplifies many connections between components (the host name is always the 
same), but you must still define connections based on the TCP/IP protocol. 


If your servers are protected by a firewall, you may need to open the necessary ports to allow 
the client components to communicate with the servers. 


Server Component Ports


Component            Port Type                        Default Port
[illegible]          Request                          Dynamic
Access Server        Communication                    4001
Metadata Browsing    Communication (JMX Connector)
View Data            Communication (JMX Connector)


Use the Data Services Designer to configure fixed debugger and job server request ports. 





Client Component Ports


Data Services Designer to                          Port Type    Default Port
[illegible]                                        Request      [illegible]
[illegible]                                        Request      [illegible]
Web Application Server (for Management Console)                 8080


Use the Central Management Console (CMC) to configure a fixed CMS request port.


Central Management Server


Before you can install SAP Data Services, you must have a working SAP BusinessObjects BI 
platform Central Management Server (CMS). SAP Data Services relies on the CMS for the 
following: 


• Centralized user and group management
• Flexible authentication methods
• Password enforcement policies
• Administrative housekeeping services
• RFC server hosting
• Services for integrating other SAP software


If you do not have a SAP BusinessObjects BI platform installation, the basic CMS functions 
required by SAP Data Services can be provided by SAP BusinessObjects Information platform 
services (IPS). 

Web Application Server 

Several SAP Data Services functions require integration with a Java web application server.


The SAP BusinessObjects BI platform and Information Platform Services provide Tomcat as
the default Java web application server. The web application server must be operational and
available when you run the installation program.


For a complete list of supported web application servers, see the Product Availability Matrix
(PAM).


The Management Console option installs only web applications to a supported Java web 
application server. This option is useful for deploying web applications to nodes in a web 
application server cluster. 

Database Server 


To configure the SAP Data Services repository during installation, set up a database server 
that is operational and accessible when you install SAP Data Services. 


The database server hosts the SAP Data Services repository. 


SAP bundles the SAP Sybase SQL Anywhere database server with the SAP BusinessObjects 
BI platform and Information Platform Services (IPS) installation. To use the bundled 
database, select the option during BI or IPS installation. Data Services also supports the 
following database servers: 


• IBM DB2
• Microsoft SQL Server
• MySQL
• Oracle
• SAP HANA
• SAP Sybase Adaptive Server Enterprise


For a detailed list of supported database versions, revision levels, and requirements, see the 
Product Availability Matrix (PAM).


Repositories
Lesson 2
Datastores
Lesson 3
File Formats
Lesson 4
Jobs and Data Flows


UNIT OBJECTIVES 

e Create a local repository 
e Create a datastore 

e Create a file format 


e Create and execute jobs 
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Repositories 


LESSON OBJECTIVES 


After completing this lesson, you will be able to: 


e Create a local repository 
Repository Types


Type                 Purpose
Central (Optional)   Performs source control and versioning across multiple local
                     repositories


Figure 16: SAP Data Services Repository Types


Central repositories are NOT "production" repositories. They do not connect to job servers. 
They are for code sharing among multiple ETL developers (each with their own local 
repositories). Central repositories have some of the features of dedicated source control or 
version management applications. However, if your organization has an existing file-based 
code source management application, that might be sufficient. The primary advantage of 
implementing central repositories is the fact that they are integrated (and thus more 
accessible) with other SAP Data Services components (such as Designer and Management 
Console). 


Data Services Repository Configuration 


Task                      Application
Add repository tables     Repository Manager
Set repository security   Central Management Console
Test/utilize repository   Data Services Designer
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Datastores 


LESSON OBJECTIVES 
After completing this lesson, you will be able to: 


Create a datastore 


Datastores 


Datastores represent connection configurations between the software and databases or applications. 
These configurations can be direct or through adapters. Datastore configurations allow the software 
to access metadata from a database or application and read from or write to that database or 
application while the software executes a job. 
SAP Data Services datastores can connect to:
• Databases and mainframe file systems
• Applications that have pre-packaged or user-written adapters
• J.D. Edwards One World and J.D. Edwards World, Oracle Applications, PeopleSoft, SAP
  Applications, SAP Data Quality Management, microservices for location data, SAP NetWeaver
  BW, Siebel Applications, and Google BigQuery. For more information, see the appropriate
  supplement guide.
• Remote servers using FTP, SFTP, and SCP


Figure 18: Datastores





The specific information that a datastore object can access depends on the connection 
configuration. When your database or application changes, make corresponding changes in 
the datastore information in the software. The software does not automatically detect the 
new information. 


You can create multiple configurations for a datastore. This allows you to plan ahead for the different
environments your datastore may be used in and limits the work involved with migrating jobs. For
example, you can add a set of configurations (DEV, TEST, and PROD) to the same datastore name.
These connection settings stay with the datastore during export or import. You can group any set of
datastore configurations into a system configuration. When running or scheduling a job, select a
system configuration, and thus, the set of datastore configurations for your current environment.
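
The relationship between datastores, datastore configurations, and system configurations can be sketched in a few lines of Python. This is only an illustrative model, not the SAP Data Services API: the Datastore class, the resolve_connections function, and all datastore names and hosts are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Datastore:
    """A datastore name with several named connection configurations."""
    name: str
    configurations: dict = field(default_factory=dict)  # configuration name -> settings


def resolve_connections(datastores, system_configuration):
    """Return the settings each datastore uses under one system configuration.

    system_configuration maps a datastore name to a configuration name.
    """
    resolved = {}
    for datastore in datastores:
        configuration_name = system_configuration[datastore.name]
        resolved[datastore.name] = datastore.configurations[configuration_name]
    return resolved


# Hypothetical datastores and hosts, for illustration only.
sales = Datastore("DS_SALES", {
    "DEV": {"host": "dev-db.example.com", "database": "sales_dev"},
    "PROD": {"host": "prod-db.example.com", "database": "sales"},
})
warehouse = Datastore("DS_DWH", {
    "DEV": {"host": "dev-db.example.com", "database": "dwh_dev"},
    "PROD": {"host": "prod-dwh.example.com", "database": "dwh"},
})

# A system configuration groups one configuration per datastore.
SYSTEM_CONFIGURATIONS = {
    "SC_DEV": {"DS_SALES": "DEV", "DS_DWH": "DEV"},
    "SC_PROD": {"DS_SALES": "PROD", "DS_DWH": "PROD"},
}

prod_connections = resolve_connections(
    [sales, warehouse], SYSTEM_CONFIGURATIONS["SC_PROD"]
)
```

Selecting SC_PROD instead of SC_DEV changes every connection at once, while the job itself keeps referring to the same datastore names.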


Figure 19: Datastore Configurations





Creating multiple configurations for a single datastore allows you to consolidate separate 
datastore connections for similar sources or targets into one source or target datastore with 
multiple configurations. 


Then, you can select a set of configurations that includes the sources and targets you want by 
selecting a system configuration when you execute or schedule the job. The ability to create 
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multiple datastore configurations provides greater ease-of-use for job portability scenarios, 
such as the following: 


• OEM (different databases for design and distribution)
• Migration (different connections for DEV, TEST, and PROD)
• Multi-instance (databases with different versions or locales)
• Multi-user (databases for central and local repositories)


Database datastores can represent single or multiple connections with:
• Legacy systems using Attunity Connect
• HP Vertica, IBM DB2, Informix, Microsoft SQL Server, MySQL, Netezza, Oracle, SAP ASE, SAP Data
  Federator, SAP SQL Anywhere, SAP HANA, SAP Vora, Sybase IQ, and Teradata databases (using native
  connections)
• Other databases (through ODBC)
• A repository, using a memory datastore or persistent cache datastore


You can create a connection to most of the data sources using the server name instead of the DSN 
(Data Source Name) or TNS (Transparent Network Substrate) name. Server name connections (also 
known as DSN-less and TNS-less connections) eliminate the need to configure the same DSN or 
TNS entries on every machine in a distributed environment. 
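
To make the difference concrete, the following Python sketch builds both kinds of ODBC connection string. It is an illustration only: the function names, the DSN name, the host, and the credentials are hypothetical, and the SERVER=host,port form shown is the one used by the Microsoft SQL Server ODBC driver. Other drivers use different keywords.

```python
def dsn_connection_string(dsn, user, password):
    """Connection that relies on a DSN entry configured on every machine."""
    return f"DSN={dsn};UID={user};PWD={password}"


def server_name_connection_string(driver, server, port, database, user, password):
    """DSN-less connection: the server details travel with the connection itself."""
    return (
        f"DRIVER={{{driver}}};SERVER={server},{port};"
        f"DATABASE={database};UID={user};PWD={password}"
    )


# Hypothetical values, for illustration only.
with_dsn = dsn_connection_string("SALES_DSN", "etl_user", "secret")
dsn_less = server_name_connection_string(
    "ODBC Driver 17 for SQL Server", "sqlhost.example.com", 1433,
    "sales", "etl_user", "secret",
)
```

The first string only works on machines where SALES_DSN has been defined. The second carries the host, port, and database itself, which is why no per-machine DSN entry is needed.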


Figure 20: Database Datastores


Before defining a database datastore, you must get appropriate access privileges to the 
database or file system that the datastore describes. 





For example, to allow the software to use parameterized SQL when reading or writing to DB2 
databases, authorize the user (of the datastore/database) to create, execute, and drop 
stored procedures. If a user is not authorized to create, execute, and drop stored procedures, 
jobs will still run. However, the jobs will produce a warning message and will run less 
efficiently. 


The software allows you to create a database datastore using Memory as the Database type. 
Memory datastores are designed to enhance processing performance of data flows executing in real- 
time jobs. Data (typically small amounts in a real-time job) is stored in memory to provide immediate 
access instead of going to the original source data. 


Memory tables can be used to perform the following functions: 
• Move data between data flows in real-time jobs. By caching intermediate data, the performance of real-time
  jobs with multiple data flows is far better than it would be if files or regular tables were used to store
  intermediate data. For best performance, only use memory tables when processing small quantities of data.
• Store table data in memory for the duration of a job. By storing table data in memory, the LOOKUP_EXT
  function and other transforms and functions that do not require database operations can access data without
  having to read it from a remote database.
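
The benefit of keeping lookup data in memory can be sketched in Python. This is a conceptual model only, not how the software implements memory tables: RemoteTable, load_memory_table, and the country data are hypothetical stand-ins.

```python
class RemoteTable:
    """Stand-in for a table in a remote database; counts how often it is read."""

    def __init__(self, rows):
        self._rows = rows
        self.read_count = 0

    def fetch_all(self):
        self.read_count += 1
        return list(self._rows)


def load_memory_table(remote_table, key_column):
    """Read the remote table once and index its rows by the lookup key."""
    return {row[key_column]: row for row in remote_table.fetch_all()}


# Hypothetical lookup data, for illustration only.
country_table = RemoteTable([
    {"code": "DE", "name": "Germany"},
    {"code": "FR", "name": "France"},
])

memory_table = load_memory_table(country_table, "code")

# Every later lookup is served from memory, not from the remote table.
names = [memory_table[code]["name"] for code in ["FR", "DE", "FR"]]
```

However many lookups follow, the remote table is read only once, which is the effect the memory table is meant to achieve for the duration of a job.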


Figure 21: Memory Datastores 


A datastore normally provides a connection to a database, application, or adapter. By 
contrast, a memory datastore contains memory table schemas saved in the repository. 





Memory tables are schemas that allow you to cache intermediate data. Memory tables can
cache data from relational database tables and hierarchical data files such as XML messages
and SAP IDocs (both of which contain nested schemas).
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Note: 

The lifetime of memory table data is the duration of the job. The data in memory 
tables cannot be shared between different real-time jobs. Support for the use of 
memory tables in batch jobs is not available. 


Various database vendors support one-way communication paths from one database server to 
another. Oracle calls these paths database links. 


In DB2, the one-way communication path from a database server to another database server is 
provided by an information server that allows a set of servers to get data from remote data sources. 
In Microsoft SQL Server, linked servers provide the one-way communication path from one database 
server to another. These solutions allow local users to access data on a remote database, which can 
be on the local or a remote computer and of the same or different database type. 


Figure 22: Linked Datastores


For example, a local Oracle database server, called Orders, can store a database link to
access information in a remote Oracle database, Customers. Users connected to Customers,
however, cannot use the same link to access data in Orders. Users logged into database
Customers must define a separate link, stored in the data dictionary of database Customers,
to access data on Orders.
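
The one-way nature of a database link can be modeled in Python as follows. This is a conceptual sketch, not database code: the DatabaseServer class, the link name TO_CUSTOMERS, and the table contents are hypothetical.

```python
class DatabaseServer:
    """Toy model of a database server that stores its own outgoing links."""

    def __init__(self, name):
        self.name = name
        self.tables = {}
        self.links = {}  # link name -> remote DatabaseServer, stored locally

    def create_link(self, link_name, remote):
        self.links[link_name] = remote

    def query_remote(self, link_name, table):
        if link_name not in self.links:
            raise LookupError(f"{self.name} has no database link named {link_name!r}")
        return self.links[link_name].tables[table]


orders = DatabaseServer("Orders")
customers = DatabaseServer("Customers")
customers.tables["CUSTOMER"] = [{"id": 1, "name": "Acme"}]
orders.tables["ORDER_HEADER"] = [{"order_id": 10, "customer_id": 1}]

# The link is stored on Orders only, so it works in one direction.
orders.create_link("TO_CUSTOMERS", customers)
remote_rows = orders.query_remote("TO_CUSTOMERS", "CUSTOMER")

try:
    customers.query_remote("TO_CUSTOMERS", "ORDER_HEADER")
    reverse_allowed = True
except LookupError:
    reverse_allowed = False
```

For Customers to read from Orders, a second link would have to be created on Customers itself.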





The software refers to communication paths between databases as database links. The
datastores in a database link relationship are called linked datastores. The software uses
linked datastores to enhance its performance by pushing down operations to a target
database using a target datastore.


LESSON SUMMARY 
You should now be able to: 





e Create a datastore 
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File Formats 


LESSON OBJECTIVES 


After completing this lesson, you will be able to: 


e Create a file format 


File Formats 




A file format defines a connection to a file. Therefore, you use a file format to connect to 
source or target data when the data is stored in a file rather than a database table. The object 
library stores file format templates that you use to define specific file formats as sources and 
targets in data flows. 


File format objects can describe files of the following types: 

• Delimited: Characters such as commas or tabs separate each field
• Fixed width: You specify the column width
• SAP transport: Use to define data transport objects in SAP application data flows
• Unstructured text: Use to read one or more files of unstructured text from a directory
• Unstructured binary: Use to read one or more binary documents from a directory


Error Type                    Description
Data-type conversion errors   A field might be defined in the File Format Editor as having a
                              particular data type, while the data encountered is actually
                              varchar.
Row-format errors             In the case of a fixed-width file, the software identifies a row
                              that does not match the expected width value.


Figure 23: Flat File Format Error Handling





In the File Format Editor, the Error Handling set of properties allows you to choose whether or 
not to have the software perform error-handling actions. 


You can have the software perform the following tasks: 
• Check for either of the two types of flat-file source error
• Write the invalid row(s) to a specified error file
• Stop processing the source file after reaching a specified number of invalid rows
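
The three error-handling tasks above can be sketched in Python for a fixed-width source. This is a minimal illustration of the idea, not the File Format Editor's implementation: the function names, the column layout, and the sample rows are assumptions, and the invalid rows are collected in a list that a caller could write to an error file.

```python
def parse_fixed_width_row(line, columns):
    """Parse one row; raise ValueError on a row-format or conversion error."""
    expected_width = sum(width for _, width, _ in columns)
    if len(line) != expected_width:
        raise ValueError(
            f"row-format error: expected {expected_width} characters, got {len(line)}"
        )
    row, start = {}, 0
    for name, width, convert in columns:
        raw = line[start:start + width].strip()
        try:
            row[name] = convert(raw)
        except ValueError:
            raise ValueError(
                f"conversion error: column {name!r} cannot hold {raw!r}"
            ) from None
        start += width
    return row


def read_fixed_width(lines, columns, max_invalid_rows):
    """Collect valid rows and invalid rows; stop once the error limit is reached."""
    valid_rows, error_rows = [], []
    for line in lines:
        try:
            valid_rows.append(parse_fixed_width_row(line, columns))
        except ValueError as error:
            error_rows.append((line, str(error)))
            if len(error_rows) >= max_invalid_rows:
                break  # stop processing the source
    return valid_rows, error_rows


# Hypothetical layout: a 4-character integer id followed by a 6-character name.
COLUMNS = [("id", 4, int), ("name", 6, str)]
SOURCE = ["0001Alice ", "00X2Bob   ", "0003Carol", "0004Dave  "]

valid_rows, error_rows = read_fixed_width(SOURCE, COLUMNS, max_invalid_rows=2)
```

Here the second row fails data-type conversion, the third row is one character too short, and reading stops before the fourth row because the limit of two invalid rows has been reached.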
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time services then wait for messages from the access server. When the access server 
receives a message, it passes the message to a running real-time service designed to process 
this message type. The real-time service processes the message and returns a response. The 
real-time service continues to listen and process messages on demand until it receives an 
instruction to shut down. 
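
The routing of messages to real-time services can be pictured with a small Python model. It is a conceptual sketch only and does not reflect the actual access server interfaces: AccessServer, the message type OrderStatusRequest, and the service function are hypothetical.

```python
class AccessServer:
    """Toy router that passes each message to the service for its type."""

    def __init__(self):
        self._services = {}  # message type -> running real-time service

    def register(self, message_type, service):
        self._services[message_type] = service

    def dispatch(self, message_type, payload):
        if message_type not in self._services:
            raise LookupError(f"no real-time service handles {message_type!r}")
        # The service processes the message and returns a response.
        return self._services[message_type](payload)


def order_status_service(payload):
    """Hypothetical real-time service for one message type."""
    return {"order_id": payload["order_id"], "status": "SHIPPED"}


access_server = AccessServer()
access_server.register("OrderStatusRequest", order_status_service)
response = access_server.dispatch("OrderStatusRequest", {"order_id": 42})
```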


Scheduled Jobs

Batch jobs are scheduled. To schedule a job, use the Administrator or use a
third-party scheduler.

When jobs are scheduled by third-party software:
• The job initiates outside of the software.
• The job operates on a batch file (or shell script for UNIX) that has been
  exported from the software.

When a job is invoked by a third-party scheduler:
• The corresponding Job Server must be running.
• The Designer does not need to be running.


Data flows extract, transform, and load data. 


Everything that has to do with data, including reading sources, transforming data, and loading 
targets, occurs inside a data flow. The lines connecting objects in a data flow represent the flow of 
data through data transformation steps. 


After you define a data flow, you can add it to a job or work flow. From inside a work flow, a data flow 
can send and receive information to and from other objects through input and output parameters. 


Figure 27: Data Flows 





Be aware that the data you provide gets placed into trace logs, sample reports, and 
repositories (side-effect data), and so on. In other words, your data will find its way into places 
other than output files. 


An embedded data flow is a data flow that is called from inside another data flow. 


Data passes into or out of the embedded data flow from the parent flow through a single source or 
target. 


An embedded data flow can contain any number of sources or targets, but only one input or one 
output can pass data to or from the parent data flow. 


An embedded data flow is a design aid that has no effect on job execution. When the software 
executes the parent data flow, it expands any embedded data flows, optimizes the parent data flow, 
and then executes it. 





Figure 28: Embedded Data Flows 


Use embedded data flows to perform the following tasks: 
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e Simplify data flow display: Group sections of a data flow in embedded data flows to allow 
clearer layout and documentation. 


e Reuse data flow logic: Save logical sections of a data flow so you can use the exact logic in 
other data flows, or provide an easy way to replicate the logic and modify it for other flows. 


e Debug data flow logic: Replicate sections of a data flow as embedded data flows so you can 
execute them independently. 


LESSON SUMMARY 


You should now be able to: 


• Create and execute jobs


Scheduling SAP Data Services Jobs
Lesson 2
Using Server Groups and Distribution Levels
Lesson 3
Monitoring SAP Data Services Activity


UNIT OBJECTIVES 
e Schedule SAP Data Services jobs 
e Explain server groups and distribution levels 


e Monitor SAP Data Services activity 
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Scheduling SAP Data Services Jobs 


LESSON OBJECTIVES 


After completing this lesson, you will be able to: 


e Schedule SAP Data Services jobs 


Scheduling Jobs 

You can execute jobs as immediate tasks in either the SAP Data Services Designer or in the 
SAP Data Services Management Console. However, you cannot set schedules for jobs from 
the Designer. Scheduling allows you to run jobs automatically on a periodic basis. There are 
three methods for scheduling, each with their own advantages. 


Method          Description
BOE Scheduler   If you are using SAP BusinessObjects Business Intelligence platform and you
                want to manage your SAP Data Services job schedules in that application, first
                create a connection to a Central Management Server (CMS), then configure
                the schedule to use that Adaptive Job Server. A program object is added to the
                Data Services folder in the CMS.





Figure 29: Scheduling Methods


LESSON SUMMARY 


You should now be able to: 


e Schedule SAP Data Services jobs 
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LESSON OBJECTIVES 


After completing this lesson, you will be able to: 


e Explain server groups and distribution levels 


Server Groups 


You can group job servers on different computers into a logical SAP Data Services component called
a server group. A server group automatically measures resource availability on each job server in the
group and distributes scheduled batch jobs to the job server with the lightest load at runtime.


Figure 30: Server Groups (job servers on Computer 1 and Computer 2)


All the job servers in an individual server group must be associated with the same repository, 
which must be defined as a default repository. Each computer can only contribute one job 
server to a server group. 
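
The selection rule and the two membership constraints can be sketched in Python. This is an illustrative model rather than the software's scheduling logic: the JobServer class, the single numeric load value, and all names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobServer:
    name: str
    computer: str
    repository: str
    load: float  # hypothetical load measure; lower means more available


def validate_server_group(job_servers):
    """Check the two membership rules for a server group."""
    repositories = {server.repository for server in job_servers}
    if len(repositories) > 1:
        raise ValueError("all job servers must be associated with the same repository")
    computers = [server.computer for server in job_servers]
    if len(computers) != len(set(computers)):
        raise ValueError("each computer can contribute only one job server")


def pick_job_server(job_servers):
    """Return the job server with the lightest load."""
    validate_server_group(job_servers)
    return min(job_servers, key=lambda server: server.load)


# Hypothetical server group, for illustration only.
server_group = [
    JobServer("JS_A", "computer1", "REPO_LOCAL", 0.72),
    JobServer("JS_B", "computer2", "REPO_LOCAL", 0.31),
]
chosen = pick_job_server(server_group)
```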


Once a server group is associated with a local repository, an additional option
becomes available: when scheduling or executing a job from the repository, you can specify a
distribution level. The level you select can affect performance and the effective use of
caches. We recommend testing each option with the same job and the same amount of data in
your test environment to evaluate the impact of the distribution level.


Distribution Level   Description
Job                  The entire job executes on a single job server.
Data flow            Data flows within a job can be executed on separate job
                     servers (within the same server group) and can take advantage
                     of additional memory (up to two gigabytes) for both in-memory
                     and pageable cache on another computer.
Sub data flow        Functions and/or transforms with Run as a separate process
                     enabled can be executed on separate job servers.
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LESSON OBJECTIVES 


After completing this lesson, you will be able to: 


e Monitor SAP Data Services activity 


Management Console 


The Auto Documentation feature provides a convenient and comprehensive way to browse all of the 
objects in a repository and create printed documentation without the need to open the SAP Data 
Services Designer. 


After you create a project, you can use Auto Documentation to quickly create a PDF or Microsoft
Word file that captures a selection of job, work flow, and/or data flow information, including graphical
representations and key mapping details.


Figure 32: Auto Documentation


A user cannot modify objects displayed in Auto Documentation. This "read-only" nature
makes the feature a valuable way to expose repository objects to users without risking
accidental changes to the metadata. It is appropriate for users who need to "look, but don't
touch."





The Data Validation dashboard provides graphical depictions that let you evaluate the reliability of your
target data based on the validation rules you created in your SAP Data Services batch jobs. This
feedback allows business users to quickly review, assess, and identify potential inconsistencies or
errors in source data.


By default, the dashboard displays the percentage of records which failed a validation rule. The 
dashboard may be further configured into Functional Areas and Business Rules. 
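
The default figure shown by the dashboard, the percentage of records that failed each validation rule, can be computed as in the following Python sketch. The function name, the rule names, and the sample results are hypothetical and serve only to illustrate the calculation.

```python
def failure_percentages(results):
    """Return the percentage of records that failed each validation rule.

    results is an iterable of (rule_name, passed) pairs, one per checked record.
    """
    totals, failures = {}, {}
    for rule, passed in results:
        totals[rule] = totals.get(rule, 0) + 1
        if not passed:
            failures[rule] = failures.get(rule, 0) + 1
    return {rule: 100.0 * failures.get(rule, 0) / totals[rule] for rule in totals}


# Hypothetical validation results, for illustration only.
sample_results = [
    ("zip_not_null", True),
    ("zip_not_null", False),
    ("zip_not_null", True),
    ("zip_not_null", True),
    ("valid_country", True),
    ("valid_country", True),
]
percentages = failure_percentages(sample_results)
```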


Functional Areas are groups of jobs which call data flows containing validation transforms. You can 
create Functional Areas and assign jobs to them in Settings. Each job can only be assigned to one 
Functional Area (at a time). 


Business Rules are groups of validation transform outputs. The available priority settings for 
Business Rules are High, Medium, and Low. This priority visually reflects the degree of importance 
placed on failures of the rule. Each Business Rule can only be assigned to one Functional Area. 
Each validation transform can only be assigned to one Business Rule. 


Figure 33: Data Validation Dashboard


To enable data validation statistics collection for your reports, you must verify two options:
one at the validation transform level and one at the job execution level.





In Designer, navigate to the validation transforms from which you want to collect data 
validation statistics for your reports. For the columns that have been enabled for validation, in 
the transform editor click the Validation transform options tab and select the checkbox Collect 
data validation statistics. 
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1. Which of the following statements is true of a data flow? 


Choose the correct answer. 


[ ] A It runs the entire job on a single job server.
[X] B It can take advantage of additional memory.
[ ] C It is a function that can run as a separate process.
[ ] D It must be executed from a repository.


2. To which of the following formats can the Auto Documentation feature be output?


Choose the correct answers. 


[X] A Microsoft Word
[X] B PDF
[ ] C Text
[ ] D Microsoft Excel


3. Which of the following can be used to schedule jobs? 


Choose the correct answer. 
[ ] A Workbench
[ ] B Server Manager
[ ] C Designer
[X] D Batch file
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Lesson 2 
Security for SAP Data Services Objects


UNIT OBJECTIVES 
e Explain the SAP BI Platform security model 


e Explain security for SAP Data Services objects 
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SAP BI Platform Security Model 


LESSON OBJECTIVES 


After completing this lesson, you will be able to: 


e Explain the SAP BI Platform security model 


SAP BI Platform Security 


Licensing 


The SAP BusinessObjects Business Intelligence (BI) platform supports two license types: 


e Limited by concurrent session pool 


Designed for users who need occasional access to the BI platform. This license specifies 
how many users can be connected to the BI platform at any given time. 


e Guaranteed - named user 


Designed for users who require access to BI platform regardless of the number of other 
people who are currently connected. 


The license type grants and restricts access to particular tasks and applications. Depending 
on which license you have, you may be unable to access some applications, create content, or 
add documents to the repository. 
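
The difference between the two license types can be modeled in Python. This is a conceptual sketch and not the BI platform's licensing mechanism: the LicenseManager class, the pool size, and the user names are hypothetical.

```python
class LicenseManager:
    """Toy model of a concurrent session pool next to named-user licenses."""

    def __init__(self, concurrent_limit, named_users):
        self.concurrent_limit = concurrent_limit
        self.named_users = set(named_users)
        self.concurrent_sessions = set()

    def log_on(self, user):
        if user in self.named_users:
            return True  # guaranteed access, independent of the pool
        if user in self.concurrent_sessions:
            return True  # already holds a session from the pool
        if len(self.concurrent_sessions) < self.concurrent_limit:
            self.concurrent_sessions.add(user)
            return True
        return False  # the pool is exhausted

    def log_off(self, user):
        self.concurrent_sessions.discard(user)


# Hypothetical setup: one concurrent session and one named user.
licenses = LicenseManager(concurrent_limit=1, named_users={"admin"})
first_round = [licenses.log_on("alice"), licenses.log_on("bob"), licenses.log_on("admin")]

licenses.log_off("alice")
bob_after_logoff = licenses.log_on("bob")
```

In this model bob is refused while alice holds the only concurrent session, whereas the named user admin can always log on.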


Note: 


Choose License Key in the CMC for more information on your licensing scheme. 





User Account Properties 


After a user account has been created, you can modify the account properties. The following 
are some of the properties that can be modified: 


e Account Name 


The account name is the unique identifier for a user account and is the user name entered 
when logging into the BI platform. 


• Full Name


This optional field is used to capture the user's full name. We recommend that you use this 
field, particularly when managing many users. 


e E-mail 
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This optional field is used to add the user's e-mail address and is used for reference only. 
For example, if the user forgets their password sometime in the future, you can retrieve 
their e-mail address from this field to send them their password. 


e Description 


This optional field is used to add information about the user, such as their position, 
department, or geographic location. 


e Enterprise Password Settings 


User password settings allow you to change the password and password settings for the
user.


Global password settings can be configured in the Authentication area of the Central 
Management Console. 


e Database Credentials 


When a database credential is added to an account, database credentials are enabled in 
the user's profile. 


e Access Type 


This option specifies how the user connects to the BI platform based on the license 
agreement. 


• Attribute Binding


• Account is disabled


This checkbox allows the administrator to deactivate the user account, instead of 
permanently deleting the account. This option is useful when administering users who will 
be temporarily denied system access, such as employees taking parental leave. 


Select the Account is disabled checkbox to disable the Guest account and make it 
unavailable for use. 


e Alias 


If a user has multiple accounts within the BI platform, use this feature to link the accounts. 
This results in the user having multiple BI platform logon credentials that map to one BI 
platform account. 


You can also use the New Alias button to create a new alias. 


Groups 


After creating a new group, you can add users, add subgroups, or specify group membership, 
so that the new group is actually a subgroup. Subgroups provide additional levels of 
organization, so they are useful when you set object rights to control users' access to your BI 
platform content. 


It is useful to create subgroups when you need to further classify groups of users. For 
example, users can be grouped by location (such as London), and then further divided 
according to their department. 
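
The effect of nesting groups can be sketched in Python. The sketch assumes that a member of a subgroup is also treated as a member of every parent group. The function, the group names, and the user names are hypothetical.

```python
def effective_groups(user, members, parents):
    """Return every group a user belongs to, directly or through parent groups.

    members maps a group to its direct users; parents maps a subgroup to its
    parent group.
    """
    result = set()
    for group, users in members.items():
        if user in users:
            current = group
            # Walk up the hierarchy until the top or an already visited group.
            while current is not None and current not in result:
                result.add(current)
                current = parents.get(current)
    return result


# Hypothetical hierarchy: a location group divided by department.
MEMBERS = {
    "London": set(),
    "London Finance": {"asmith"},
    "London Sales": {"bjones"},
}
PARENTS = {"London Finance": "London", "London Sales": "London"}

groups_for_asmith = effective_groups("asmith", MEMBERS, PARENTS)
```

A right assigned to London would then reach users in both departments, while a right assigned to London Finance would reach only that subgroup.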


Group Hierarchy 


Groups are collections of users who share the same account privileges; therefore, you may 
create groups that are based on department, role, or location. Groups enable you to change 
the rights for users in one place (a group) instead of modifying the rights for each user 
account individually. You can also assign object rights to a group or groups. 


© Copyright. All rights reserved.


Unit 5: SAP Data Services Security 


In the Users and Groups area, you can create groups that give a number of people access to
the report or folder. These groups allow you to make changes in one place instead of
modifying each user account individually. You can also view several default group accounts,
summarized in the table below.


LESSON SUMMARY
You should now be able to: 


e Explain the SAP BI Platform security model 
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LESSON OBJECTIVES 


After completing this lesson, you will be able to: 


e Explain security for SAP Data Services objects 
Default Security Groups


Group                                        Application
Data Services Designer Users                 Designer
Data Services MU Users                       Management Console
Data Services Profiler Administrator Users   Management Console


Figure 36: SAP Data Services Security Groups


The installation program for SAP Data Services adds several security groups to the IPS (or 
SAP BI Platform). You can utilize these default groups by assigning users to them, or create 
your own custom groups, or a combination of both. In any case, it is advisable to become 
familiar with the rights assigned to these default groups. This knowledge will help you 
understand the SAP BusinessObjects security model and plan security in your deployment. 





Access to Auto Documentation
View Data Quality sample data
Manage batch job history
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By default the Data Services Administrator Users group is assigned Full Control (all rights 
granted) to all Data Services objects.


Figure 38: SAP Data Services Security - Default Repository Rights


By default the Data Services Designer Users group is NOT granted rights to any specific 
repositories. 


LESSON SUMMARY 





You should now be able to: 


e Explain security for SAP Data Services objects 
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LESSON OBJECTIVES 


After completing this lesson, you will be able to: 


e Explain file-based transfer 


File-based Transfer 


When operating with multiple developers and/or environments (such as DEV, TEST, QA, PROD), you 
need to move objects among different local repositories. Typically, each developer maintains their 
own local repository (perhaps multiple ones). 

The following methods can be used for object migration: 


File Export/Import 


Central Repositories 


Figure 39: Multi-user Development and Migration 


Note: 

Concurrent logins to the same local repository via Designer are NOT supported. A
strong warning displays when this happens. Usually this is the result of a prior
abnormal exit from Designer while connected to the repository.





The export feature gives you the flexibility to manage and migrate projects involving multiple 
developers and different execution environments. When you export a job from a development 
repository to a production repository, you can change the properties of objects being 
exported to match your production environment. 


In particular, you can change datastore definitions—application and database locations and 
login information—to reflect production sources and targets. 


You can export objects to another repository or a flat file (.atl or .xml). If the destination is
another repository, you must be able to connect to and have write permission for that
repository, and your repository versions must match.


If you choose a file as the export destination, Data Services does not provide options to 
change environment-specific information. 


You can also export an entire repository to a file. When you export or import a repository, jobs 
and their schedules (created in SAP Data Services) are automatically exported or imported as 
well. Schedules cannot be exported or imported without an associated job and its repository. 
To import, simply log in to the destination local repository via Designer and then import the 
previously exported file. 
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Importing objects or an entire repository from a file overwrites existing objects with the same 
names in the destination repository. 
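
Because an import overwrites objects with matching names, it can help to know in advance which objects would be replaced. The following Python sketch models that check with plain dictionaries. It is not an SAP Data Services interface, and the object names are hypothetical.

```python
def import_objects(destination, imported):
    """Merge imported objects into the destination repository model.

    Returns the names of the existing objects that were overwritten.
    """
    overwritten = sorted(set(destination) & set(imported))
    destination.update(imported)  # objects with the same name are replaced
    return overwritten


# Hypothetical repository contents, keyed by object name.
destination_repository = {
    "JOB_Load": "existing job definition",
    "DF_Customers": "existing data flow definition",
}
exported_file_objects = {
    "DF_Customers": "imported data flow definition",
    "DF_Orders": "imported data flow definition",
}

overwritten_names = import_objects(destination_repository, exported_file_objects)
```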


LESSON SUMMARY 
You should now be able to: 





e Explain file-based transfer 
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LESSON OBJECTIVES 


After completing this lesson, you will be able to: 


e Use the central repository 


Central Repositories 


A central repository stores the team copy of an SAP Data Services object and acts as source
control for multi-user environments.


A central repository contains all of the information that is in a local repository, such as object 
definitions, datastores, and so on. The central repository is a storage location for this information. To 
change the object, you check out objects from the central repository to your local repository. 


When you simply want to use (but NOT modify) an object from a central repository in your local
repository, you copy it from central to local. In other words, only check out the objects which you
need to modify directly. For example, do not check out a job if you only plan to make changes to a
data flow inside the job. Check out the data flow instead.


Figure 40: Central versus Local Repositories


You can compare versions of objects in the central and local repositories with the Difference 
Viewer. Right-click on the object and choose Compare Object to Central. 





Check Out/Check In 


Multiple users working from unique local repositories can connect to the same central repository. 
These users can work on the same application and share their work. However, at any given time only 
one user can check out and change a particular object. While an object is checked out to one user, 
other users can get or copy the latest version of the object. However, other users cannot change the 
object in the central repository. 


Object History 


The central repository tracks all versions of an object and the dates and times the software saved the 
version of the object. The Central Object Library in Designer displays a brief history of an item in the 
Latest Version column. For more detailed information, select the Show History menu icon in the 
Central Object Library. 
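
The check-out rule and the version history can be modeled together in a short Python sketch. It is a conceptual illustration, not the central repository's implementation: the CentralRepository class, its method names, and the object and user names are hypothetical.

```python
from datetime import datetime, timezone


class CentralRepository:
    """Toy model: one check-out per object, every check-in adds a version."""

    def __init__(self):
        self._versions = {}     # object name -> list of (version, saved at, definition)
        self._checked_out = {}  # object name -> user holding the check-out

    def _save(self, name, definition):
        history = self._versions.setdefault(name, [])
        history.append((len(history) + 1, datetime.now(timezone.utc), definition))

    def add(self, name, definition):
        if name in self._versions:
            raise ValueError(f"{name} already exists in the central repository")
        self._save(name, definition)

    def get_latest(self, name):
        """Any user can get a copy of the latest version at any time."""
        return self._versions[name][-1][2]

    def check_out(self, name, user):
        holder = self._checked_out.get(name)
        if holder is not None and holder != user:
            raise PermissionError(f"{name} is already checked out by {holder}")
        self._checked_out[name] = user
        return self.get_latest(name)

    def check_in(self, name, user, definition):
        if self._checked_out.get(name) != user:
            raise PermissionError(f"{user} has not checked out {name}")
        self._save(name, definition)
        del self._checked_out[name]

    def history(self, name):
        return [(version, saved_at) for version, saved_at, _ in self._versions[name]]


repository = CentralRepository()
repository.add("DF_LoadCustomers", "first definition")
repository.check_out("DF_LoadCustomers", "developer_a")

copy_for_b = repository.get_latest("DF_LoadCustomers")  # still allowed
try:
    repository.check_out("DF_LoadCustomers", "developer_b")
    second_checkout_allowed = True
except PermissionError:
    second_checkout_allowed = False

repository.check_in("DF_LoadCustomers", "developer_a", "second definition")
version_numbers = [version for version, _ in repository.history("DF_LoadCustomers")]
```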


Figure 41: Central Repository Features


When you create a central repository in the Repository Manager, you can choose whether to Enable Security. If you do not, the
repository is labeled "non-secure". This status is often misunderstood. A "non-secure" central repository
still has BI Platform (CMC) and database security on it. Making a central repository secure adds an
additional layer of flexibility when assigning permissions to objects. With this flexibility comes somewhat more
complexity in management.


If a user group is granted rights (in CMC) to a "non-secure" central repository, its members have full access to all
objects in it. With "secure" central repositories, the following permissions can be assigned for users and
groups to individual objects in the repository:


Permission   Description
Read         Users can only get a copy of the object from the central repository or
             compare objects between their local and central object libraries.


Figure 42: Central Repository Security


When an authenticated user adds a new object to a secure central repository, the user's 
current group receives Full permissions to the object. All other groups receive Read 
permissions. Members of the group with Full permissions can change the other groups' 
permissions for that object. 
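
The default permission assignment described above can be sketched in Python. This is an illustrative model only: the function names and group names are hypothetical, and only the Full and Read permissions mentioned in the text are used.

```python
def default_permissions(all_groups, current_group):
    """Permissions a new object receives in a secure central repository."""
    return {group: ("Full" if group == current_group else "Read") for group in all_groups}


def change_permission(permissions, acting_group, target_group, new_permission):
    """Only a group with Full permissions may change another group's permission."""
    if permissions.get(acting_group) != "Full":
        raise PermissionError(f"{acting_group} does not have Full permissions")
    permissions[target_group] = new_permission


# Hypothetical groups, for illustration only.
permissions = default_permissions(["etl_dev", "reporting", "ops"], "etl_dev")
initial_permissions = dict(permissions)

change_permission(permissions, "etl_dev", "ops", "Full")

try:
    change_permission(permissions, "reporting", "etl_dev", "Read")
    read_group_could_change = True
except PermissionError:
    read_group_could_change = False
```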


LESSON SUMMARY
You should now be able to: 


e Use the central repository 
Learning Assessment - Answers


1. Which flat file types can you use to export objects to another repository?

Choose the correct answers.

[ ] A DSF
[X] B XML
[X] C ATL
[ ] D BIAR


2. Where can the Object History be found?

Choose the correct answer.

[X] A In the Central Object Library in Designer.
[ ] B Using the CMC under Folders then selecting the object.
[ ] C In the Data Services Workbench using the Wizard.
[ ] D Using the Data Services Repository Manager viewing the repository.


3. What are the two methods for object migration?

Choose the correct answers.

[X] A Central repositories
[ ] B Import Wizard
[ ] C Upgrade Management Tool
[X] D File export/import
Lesson 1


ETL Design for Performance


UNIT OBJECTIVES


• Design ETL for performance

ETL Design for Performance


LESSON OBJECTIVES
After completing this lesson, you will be able to:


• Design ETL for performance


Performance 


Distributed Data Flow Execution 


The software provides capabilities to distribute CPU-intensive and memory-intensive data 
processing work (such as join, grouping, table comparison and lookups) across multiple 
processes and computers. 


This work distribution provides the following potential benefits: 


• Better memory management by taking advantage of more CPU resources and physical
memory


• Better job performance and scalability by using concurrent sub data flow execution to take
advantage of grid computing


You can create sub data flows so that the software does not need to process the entire data 
flow in memory at one time. You can also distribute the sub data flows to different job servers 
within a server group to use additional memory and CPU resources. 


Use the following features to split a data flow into multiple sub data flows:

• Enable the Run as a separate process option on resource-intensive operations that include the
following:

  - Query operations that are CPU-intensive and memory-intensive: Join, GROUP BY, ORDER BY, DISTINCT
  - Many other transforms, including Table Comparison, Hierarchy Flattening, Associate, Address Cleanse,
    and Match
  - The following functions: lookup_ext, count_distinct, search_replace

  If you select the Run as a separate process option for multiple operations in a data flow,
  the software splits the data flow into smaller sub data flows that use separate resources
  (memory and computer) from each other. When you specify multiple Run as a separate process
  options, the sub data flow processes run in parallel.


Figure 43: Sub Data Flows


With the Data_Transfer transform, the software does not need to process the entire data flow 
on the job server computer. Instead, the Data_Transfer transform can push down the 
processing of a resource-intensive operation to the database server. This transform splits the 
data flow into two sub data flows and transfers the data to a table in the database server to 
enable the software to push down the operation. 
From the information in the data flow specification, the software produces output while optimizing performance.
For example, for SQL sources and targets, the software creates database-specific SQL statements based on a
job's data flow diagrams. To optimize performance, the software pushes down as many transform operations as
possible to the source or target database and combines as many operations as possible into one request to the
database. For example, the software tries to push down joins and function evaluations. By pushing down
operations to the database, the software reduces the number of rows and operations that the engine must
process.

Data flow design influences the number of operations that the software can push to the source or target database. 
Before running a job, you can examine the SQL that the software generates and alter your design to produce the 
most efficient results. 


Figure 44: Push Down Operations 
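
The effect of pushing operations down can be illustrated outside the product. The following is a minimal Python sketch, not SQL generated by SAP Data Services: it uses the standard sqlite3 module as a stand-in database, and the customers and orders tables and both function names are hypothetical. It compares fetching every row and processing it in the client with sending one combined statement in which the database performs the join, filter, and grouping.

```python
import sqlite3

# Hypothetical tables for illustration only; these are not SAP Data Services objects.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'EMEA'), (2, 'APJ'), (3, 'EMEA');
    INSERT INTO orders VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0), (13, 3, 25.0);
""")


def totals_in_engine(conn):
    """No push-down: fetch every row, then join, filter, and group in the client."""
    regions = dict(conn.execute("SELECT id, region FROM customers"))
    rows_fetched = len(regions)
    totals = {}
    for customer_id, amount in conn.execute("SELECT customer_id, amount FROM orders"):
        rows_fetched += 1
        region = regions[customer_id]
        if region == "EMEA":
            totals[region] = totals.get(region, 0.0) + amount
    return totals, rows_fetched


def totals_pushed_down(conn):
    """Push-down: one combined statement; the database joins, filters, and groups."""
    rows = conn.execute("""
        SELECT c.region, SUM(o.amount)
        FROM orders o JOIN customers c ON c.id = o.customer_id
        WHERE c.region = 'EMEA'
        GROUP BY c.region
    """).fetchall()
    return dict(rows), len(rows)


engine_totals, engine_rows = totals_in_engine(conn)
pushed_totals, pushed_rows = totals_pushed_down(conn)
print(engine_totals, engine_rows)  # same totals, 7 rows transferred
print(pushed_totals, pushed_rows)  # same totals, 1 row transferred
```

Both approaches return the same result, but the pushed-down version transfers a single aggregated row instead of every source row, which is the reduction in rows and operations described above.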


Partial push-down: When a full push-down operation is not possible, SAP Data Services still attempts to
push down these operations: aggregations (GROUP BY), distinct rows, filtering, ordering, and joins (if the
sources are in the same datastore or in linked datastores).


Figure 45: Push Down Scenarios 


Hint: 

The software can only push operations supported by the DBMS down to that 
DBMS. Therefore, for best performance, try not to intersperse SAP Data 
Services transforms among operations that can be pushed down to the 
database. 


You can improve the performance of data transformations that occur in memory by caching as much data as 
possible. By caching data, you limit the number of times the system must access the database. 


Use this formula to help decide if caching might be appropriate: 


Available Job Server Memory >= (# of Rows) * (# of Columns) * 26 
(the factor of 26 assumes an average column value of 20 bytes with approximately 30% for the Job Server 
process itself) 


Figure 46: Caching Data
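
The caching rule of thumb can be checked with a few lines of arithmetic. The following Python sketch assumes the factor of 26 bytes per value given above; the function names and the sample figures (2 million rows, 15 columns, 1 GiB of available memory) are hypothetical and are not taken from the product.

```python
BYTES_PER_VALUE = 26  # 20-byte average column value plus roughly 30% overhead


def estimated_cache_bytes(rows: int, columns: int) -> int:
    """Estimate the memory needed to cache a table, using the rule of thumb."""
    return rows * columns * BYTES_PER_VALUE


def caching_might_fit(available_bytes: int, rows: int, columns: int) -> bool:
    """Return True when available memory covers the estimated cache size."""
    return available_bytes >= estimated_cache_bytes(rows, columns)


# Hypothetical figures: 2 million rows, 15 columns, 1 GiB of available memory.
needed = estimated_cache_bytes(2_000_000, 15)
fits = caching_might_fit(1024**3, 2_000_000, 15)
print(needed, fits)  # 780000000 True
```

The estimate is only a starting point: actual column widths and the memory used by other operations in the same data flow can move the result in either direction.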


SAP Data Services provides the following types of caches that your data flow can use for all of 
the operations it contains: 





• In-memory


Use in-memory cache when your data flow processes a small amount of data that fits in 
memory. 


• Pageable cache


Use pageable cache when your data flow processes a very large amount of data that does 
not fit in memory. When memory-intensive operations (such as Group By and Order By) 
exceed available memory, the software uses pageable cache to complete the operation. 


Pageable cache is the default cache type. To change the cache type, use the Cache type 
option on the data flow Properties window. 


When sources are joined using the Query transform, the following table shows how the cache settings
in the source editor and the Query editor determine whether the software loads the data in
the source table into cache.


Source Editor     Query Editor      Effective Cache
Cache Setting     Cache Setting

No                Automatic         No





Figure 47: Caching Joins 


The cache setting in the Query transform takes precedence over the setting in the source. In the
Query editor, the cache setting is set to Automatic by default. The Automatic setting carries
forward the setting from the source table.
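
The precedence rule can be written as a small decision function. This Python sketch only models the rule as stated above; the function name effective_cache and the string values "Yes", "No", and "Automatic" are chosen for illustration and are not a product API.

```python
def effective_cache(source_setting: str, query_setting: str = "Automatic") -> str:
    """Return the effective cache setting for a source joined in a Query transform.

    The Query editor setting wins unless it is "Automatic", in which case the
    source editor setting is carried forward.
    """
    if source_setting not in ("Yes", "No"):
        raise ValueError(f"unexpected source setting: {source_setting!r}")
    if query_setting not in ("Yes", "No", "Automatic"):
        raise ValueError(f"unexpected Query setting: {query_setting!r}")
    if query_setting == "Automatic":
        return source_setting
    return query_setting


print(effective_cache("No", "Automatic"))  # No  (carried forward from the source)
print(effective_cache("No", "Yes"))        # Yes (Query editor takes precedence)
print(effective_cache("Yes", "No"))        # No  (Query editor takes precedence)
```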


SAP Data Services has three lookup functions: lookup, lookup_seq, and lookup_ext.
The lookup and lookup_ext functions have cache options. Caching lookup sources improves 
performance because the software avoids the expensive task of creating a database query or full 
file scan on each row. 


Cache Option        Description

PRE_LOAD_CACHE      Preloads the result column and compare column into
                    memory (it loads the values before executing the
                    lookup).


Figure 48: Caching Lookups 


Demand-load caching of lookup values is helpful when the lookup result is the same value 
multiple times. Each time the software cannot find the value in the cache, it must make a new 
request to the database for that value. Even if the value is invalid, the software has no way of 
knowing if it is missing or just has not been cached yet. 





When there are many values and some values might be missing, demand-load caching is 
significantly less efficient than caching the entire source. 
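
The cost of missing values under demand-load caching can be seen in a small simulation. The following Python sketch is a simplified model, not the product's implementation: CountingSource is a hypothetical stand-in for a lookup table that counts requests, and the model assumes that a value that is not found is never remembered in the cache.

```python
class CountingSource:
    """Stand-in for a lookup table that counts how often it is queried."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self.requests = 0

    def fetch_one(self, key):
        self.requests += 1
        return self._rows.get(key)  # None when the key does not exist

    def fetch_all(self):
        self.requests += 1
        return dict(self._rows)


def lookup_demand_load(source, keys):
    """Cache each value the first time it is found; misses are not remembered."""
    cache = {}
    results = []
    for key in keys:
        if key not in cache:
            value = source.fetch_one(key)
            if value is None:
                results.append(None)  # a missing key is requested again next time
                continue
            cache[key] = value
        results.append(cache[key])
    return results


def lookup_preload(source, keys):
    """Load the whole source once, then answer every lookup from memory."""
    cache = source.fetch_all()
    return [cache.get(key) for key in keys]


rows = {"DE": "Germany", "FR": "France"}
keys = ["DE", "XX", "DE", "XX", "FR", "XX"]  # "XX" does not exist in the source

demand_source = CountingSource(rows)
demand_results = lookup_demand_load(demand_source, keys)
preload_source = CountingSource(rows)
preload_results = lookup_preload(preload_source, keys)

print(demand_source.requests)   # 5
print(preload_source.requests)  # 1
```

In this model the repeated value "DE" is served from the cache after its first request, but every occurrence of the missing key "XX" triggers a new request, which is why preloading wins when many values are missing.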
Persistent cache datastores provide the following benefits for data flows that process large volumes of data:


• You can store a large amount of data in persistent cache, which SAP Data Services quickly pages into memory
each time the job executes. For example, you can access a lookup table or comparison table locally (instead of
reading from a remote database).


• You can create cache tables that multiple data flows can share (unlike a memory table, which cannot be shared
between different real-time jobs). For example, if a large lookup table used in a lookup_ext function rarely
changes, you can create a cache once and subsequent jobs can use this cache instead of creating it each
time.


Persistent cache tables can cache data from relational database tables and files.

Figure 49: Persistent Cache


You create a persistent cache table by loading data into the persistent cache target table 
using one data flow. You can then subsequently read from the cache table in another data 
flow. When you load data into a persistent cache table, SAP Data Services always truncates 
and recreates the table. 
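
The load-once, read-many pattern can be sketched in Python as an analogy. This is not how SAP Data Services stores persistent cache tables: the sketch uses a JSON file in a temporary directory as a stand-in, and the function names and the country lookup data are hypothetical. It shows one step that loads the cache, a separate step that reads it, and the fact that each load replaces the previous contents.

```python
import json
import os
import tempfile


def load_persistent_cache(path, rows):
    """Loader step: always rewrite the cache from scratch (mode "w" truncates)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(rows, handle)


def read_persistent_cache(path):
    """Reader step: a separate process or job can read the stored cache."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


cache_path = os.path.join(tempfile.mkdtemp(), "country_lookup.json")

load_persistent_cache(cache_path, {"DE": "Germany", "FR": "France"})
first_read = read_persistent_cache(cache_path)

# A second load replaces the earlier contents instead of appending to them.
load_persistent_cache(cache_path, {"US": "United States"})
second_read = read_persistent_cache(cache_path)

print(first_read)   # {'DE': 'Germany', 'FR': 'France'}
print(second_read)  # {'US': 'United States'}
```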





You can set SAP Data Services to perform data extraction, transformation, and loads in parallel by setting parallel 
options for sources, transforms, and targets. In addition, you can set individual data flows and work flows to run in 
parallel by simply not connecting them in the workspace. If the Job Server is running on a multi-processor 
computer, it takes full advantage of available CPUs. 


Figure 50: Parallel Execution 


Parallel engine processes execute the parallel data flow processes. Note that if you have more 
than eight CPUs on your job server computer, you can increase the maximum number of 
engine processes to improve performance. To change the maximum number of parallel 
engine processes, use the job server options (Tools > Options > Job Server > Environment).





For more information, search the Performance Optimization Guide for "Table Partitioning"
and "Degree of Parallelism".


Performance of SAP Data Services can also be improved with these features:

• Source
  - Join Rank
  - Array Fetch Size

• Target
  - Bulk Loading
  - Number of Loaders
  - Rows per Commit


Figure 51: Additional Performance Techniques 


For more information, see the Performance Optimization Guide. 


LESSON SUMMARY
You should now be able to:


• Design ETL for performance
Learning Assessment - Answers


1. Which of the following offers performance improvements?

Choose the correct answers.

[X] A Bulk loading
[ ] B Central repository
[ ] C Information Steward
[X] D Join rank


2. A persistent cache can only cache data from database tables.

Determine whether this statement is true or false.

[ ] True
[X] False


3. Why use push down operations?

Choose the correct answer.

[X] A To influence the number of operations to the database.
[ ] B To enable a job to be scheduled in the CMC.
[ ] C To allow users and groups to access your job in the Designer.
[ ] D To enhance the use of create and drop tables in the data source.